Water Resources in Copenhagen during 20th and 21st century

This script visualizes the access to water resources across Copenhagen suburbs over time as a digital component for a course started in the Spring 2022, called City: Between Culture and Nature, taught by Mikkel Thelle and Mikkel Høghøj. The course surveys the gradual appearance of private and public bathing facilities, toilets and communal hygienic resources in the city of Copenhagen during the 20th century. By editing elements in this script, you can plot and explore different aspects of past and present hygienic amenities across the suburbs in the capital of Denmark.

As you are plotting you should be trying to answer the question of

Which Copenhagen suburbs are the most/least well-off in terms of hygiene, and why?

Before we start: data wrangling

First load the packages necessary for spatial data visualisation and analysis.

library(sf)
library(tidyverse)
library(spatstat)
library(spatialkernel)
library(googlesheets4)
library(leaflet)

Spatial data

Next, load your spatial data - polygons representing the suburbs of Copenhagen.

suburbs <- st_read("data/bydel.shp", options = "ENCODING=WINDOWS-1252")  # this option was added by Malte in 2022 to fix the Danish special characters on my computer
options:        ENCODING=WINDOWS-1252 
Reading layer `bydel' from data source `C:\Users\Adela\Documents\RStudio\1_Teaching\cds-prep\Week08 - Bathhouse\data\bydel.shp' using driver `ESRI Shapefile'
Simple feature collection with 10 features and 4 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 12.45305 ymin: 55.61284 xmax: 12.73425 ymax: 55.73271
geographic CRS: WGS 84
plot(suburbs$geometry)

tail(suburbs)
Simple feature collection with 6 features and 4 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 12.46344 ymin: 55.63174 xmax: 12.73425 ymax: 55.73271
geographic CRS: WGS 84
   id bydel_nr                      navn areal_m2
5   6        6                   Vanløse  6699011
6   5        5                     Valby  9234177
7   4        4 Vesterbro-Kongens Enghave  8472602
8   1        1                  Indre By 10488466
9   9        9                Amager Øst  9820410
10  2        2                  Østerbro  9858740
                         geometry
5  MULTIPOLYGON (((12.4982 55....
6  MULTIPOLYGON (((12.52434 55...
7  MULTIPOLYGON (((12.54553 55...
8  MULTIPOLYGON (((12.72897 55...
9  MULTIPOLYGON (((12.63082 55...
10 MULTIPOLYGON (((12.59777 55...
# write_rds(suburbs, 'data/CPHsuburbs.rds')

suburbs$id
 [1]  3  7  8 10  6  5  4  1  9  2

Attribute data

Next let’s bring in the attribute data. You can read the data in from a googlesheet with read_sheet() function from the googlesheets4 package, or you can use the read_csv() function to read the wc.csv provided in the data folder.

# Uncomment the lines below to read data from GDrive wc <-
# read_sheet('https://docs.google.com/spreadsheets/d/1iFvycp6M6bF8GBkGjA2Yde2yCIhiy5_slAkGF-RUF7w/edit#gid=0',
# col_types = 'cnnnnnnnn') write_csv(wc, 'data/wc.csv')
wc <- read_csv("data/wc.csv")

# Check out the data and try to grasp what is being summarized in the columns
wc
# A tibble: 100 x 9
   Suburb suburb_id flats wc_access  bath  year bath_communal_ct wc_communal_ct
   <chr>      <dbl> <dbl>     <dbl> <dbl> <dbl>            <dbl>          <dbl>
 1 Indre~         1 16280     11310  3800  1950               NA             NA
 2 Chris~         1  5490      3900   900  1950               NA             NA
 3 Voldk~         1 13460     12690  4560  1950               NA             NA
 4 Øster~         2 30820     28900 13750  1950               NA             NA
 5 Indre~         3 28700     24380  5910  1950               NA             NA
 6 Ydre ~         3 21710     20410  5800  1950               NA             NA
 7 Veste~         4 25850     23930  3730  1950               NA             NA
 8 Konge~         4  6270      6240  4240  1950               NA             NA
 9 Valby          5 14430     13970  8190  1950               NA             NA
10 Viger~         5  7700      7580  5050  1950               NA             NA
# ... with 90 more rows, and 1 more variable: hot_water <dbl>

Can you quess why the suburb_ids are repeated across multiple suburbs?

Spatial resolution adjustment - data aggregation

Data on access to hygienic facilities and other water resources in Copenhagen now looks good and tidy, but its spatial resolution is higher than the provided polygons (as in we have multiple rows that all fit within one suburb id). We therefore use the group_by() function to aggregate the data by id before we continue with any spatial operations. Given that the dataset is in fact a time-series, and each kvarter has a record for a given year or decade, we need to group first by the year and then only by id.

While aggregating the finer scale data into larger units, it is convenient to generate some statistics, such as percentages of flats that have bath and wc and hot water access within each suburb. We do this using the summarize() function below.

wcdata <- wc %>% group_by(year, suburb_id) %>% summarize(flats = sum(flats), bath = sum(bath), 
    pct_bath = bath/flats * 100, wc_access = sum(wc_access), pct_wc = wc_access/flats * 
        100, warmH20 = sum(hot_water), pct_wH20 = warmH20/flats * 100, communal_wc = sum(wc_communal_ct), 
    communal_bath = sum(bath_communal_ct))
wcdata
# A tibble: 50 x 11
# Groups:   year [5]
    year suburb_id flats  bath pct_bath wc_access pct_wc warmH20 pct_wH20
   <dbl>     <dbl> <dbl> <dbl>    <dbl>     <dbl>  <dbl>   <dbl>    <dbl>
 1  1950         1 35230  9260     26.3     27900   79.2    8530     24.2
 2  1950         2 30820 13750     44.6     28900   93.8   10750     34.9
 3  1950         3 50410 11710     23.2     44790   88.9    8810     17.5
 4  1950         4 32120  7970     24.8     30170   93.9    6110     19.0
 5  1950         5 22130 13240     59.8     21550   97.4   10830     48.9
 6  1950         6 10260  6780     66.1     10120   98.6    6270     61.1
 7  1950         7 27260 14790     54.3     26770   98.2   13280     48.7
 8  1950         8 19270 15000     77.8     18980   98.5   14690     76.2
 9  1950         9 23960 12470     52.0     22950   95.8   11210     46.8
10  1950        10 18000  9030     50.2     16200   90      7800     43.3
# ... with 40 more rows, and 2 more variables: communal_wc <dbl>,
#   communal_bath <dbl>
# write_rds(wcdata, 'data/CPH_wcdata.rds')

Questions:

  1. What percentage of flats have access to a bath in 1950 in Copenhagen as a whole?

  2. In what year is the percentage of flats with access to a WC in Copenhagen the lowest, and how much it is?

# A tibble: 5 x 4
   year flatsC  bathC pct_bath
  <dbl>  <dbl>  <dbl>    <dbl>
1  1950 269460 114000     42.3
2  1955 272480 124460     45.7
3  1965 281510 151350     53.8
4  1970 285214 151974     53.3
5  1981 272334 165960     60.9
# A tibble: 5 x 4
   year flatsC    wcC pct_wcC
  <dbl>  <dbl>  <dbl>   <dbl>
1  1950 269460 248330    92.2
2  1955 272480 270000    99.1
3  1965 281510 279970    99.5
4  1970 285214 253434    88.9
5  1981 272334 254358    93.4

Join the aggregated attribute data to their spatial representations

Now we can join the data on water resources with the spatial polygons for suburbs

wc_spatial <- suburbs %>% merge(wcdata, by.x = "id", by.y = "suburb_id")
wc_spatial
Simple feature collection with 50 features and 14 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 12.45305 ymin: 55.61284 xmax: 12.73425 ymax: 55.73271
geographic CRS: WGS 84
First 10 features:
  id bydel_nr     navn areal_m2 year flats  bath pct_bath wc_access   pct_wc
1  1        1 Indre By 10488466 1981 26413 14035 53.13671     22546 85.35948
2  1        1 Indre By 10488466 1950 35230  9260 26.28442     27900 79.19387
3  1        1 Indre By 10488466 1965 32470 12780 39.35941     32450 99.93840
4  1        1 Indre By 10488466 1970 30440 11386 37.40473     22381 73.52497
5  1        1 Indre By 10488466 1955 33490  9740 29.08331     33430 99.82084
  warmH20 pct_wH20 communal_wc communal_bath                       geometry
1      NA       NA          NA            NA MULTIPOLYGON (((12.72897 55...
2    8530 24.21232          NA            NA MULTIPOLYGON (((12.72897 55...
3      NA       NA        9510          2280 MULTIPOLYGON (((12.72897 55...
4      NA       NA          NA            NA MULTIPOLYGON (((12.72897 55...
5    9130 27.26187          NA            NA MULTIPOLYGON (((12.72897 55...
 [ reached 'max' / getOption("max.print") -- omitted 5 rows ]

Now that we have a merged spatial dataset with attributes, let’s review what attributes are available for visualisation.

# Review the column names to see what new columns you have created
names(wc_spatial)
 [1] "id"            "bydel_nr"      "navn"          "areal_m2"     
 [5] "year"          "flats"         "bath"          "pct_bath"     
 [9] "wc_access"     "pct_wc"        "warmH20"       "pct_wH20"     
[13] "communal_wc"   "communal_bath" "geometry"     

There is the suburb polygon data, such as id, bydel_nr, navn and areal_m2, and there is also the attribute data such as year, flats, bath ,etc. This gives us lots of choices for display. Lets put the data in a map.

Plot the data on the map

Let’s start by plotting one year alone, to learn how the map works.

Flats and water resources in 1950

Run the whole chunk below, and once it renders, look at the map. Afterwards, try changing the definition of what is to be displayed on line 116. For example, replace "flats" for some other column, such as "pct_bath", or "wc_access" to see how the map changes. To modify the legend, you can modify line 118 where we describe style. Replace style = "jenks" with "pretty", or "equal", or "quantile". What happens to your classification?

wc1950 <- wc_spatial %>% filter(year == 1950)

library(tmap)
tmap_mode(mode = "plot")
tm_shape(wc1950) + tm_borders(col = "black", lwd = 1) + tm_polygons("flats", id = "navn", 
    style = "jenks") + tm_legend(legend.position = c("RIGHT", "TOP")) + tm_compass(position = c("RIGHT", 
    "BOTTOM"), type = "rose", size = 2) + tm_scale_bar(position = c("RIGHT", "BOTTOM"), 
    breaks = c(0, 2, 4), text.size = 1) + tm_credits(position = c("RIGHT", "BOTTOM"), 
    text = "Adela Sobotkova, 2022") + tm_layout(main.title = "Copenhagen 1950 situation", 
    legend.outside = FALSE)

Flats through time

Now, that you have mastered visualization of a single year, let’s plot all the years we have available!

tmap_options(limits = c(facets.view = 5))  # we want to view 5 periods
tmap_mode(mode = "view")
tm_shape(wc_spatial) + tm_facets(by = "year", ncol = 3, nrow = 2) + tm_polygons("flats", 
    id = "navn", style = "jenks") + tm_layout(main.title = "Copenhagen Flats", legend.outside = TRUE)


Lets’ look at flats per square kilometer

Now that we have a spatial object, we can create new columns, for example utilizing the shape area to calculate the density of flats per sq km.

wc_spatial <- wc_spatial %>% mutate(area_km2 = areal_m2/1e+06, flat_per_km = flats/area_km2)
library(tmap)
tmap_options(limits = c(facets.view = 5))  # we want to view 6 years
tmap_mode(mode = "view")
tm_shape(wc_spatial) + tm_facets(by = "year", ncol = 3, nrow = 2) + tm_polygons("flat_per_km", 
    n = 5, style = "jenks")  #+


## Access to toilets and baths, per suburb and square kilometer

Lets calculate the baths and toilets available per square kilometer per each suburb

library(tmap)
tmap_options(limits = c(facets.view = 5))  # we want to view 5 years
tmap_mode(mode = "view")
tm_shape(wc_spatial) + tm_facets(by = "year", ncol = 3, nrow = 2) + tm_polygons("pct_bath", 
    id = "navn", style = "pretty", title = "% of flats with <br> access to bath")  #+



library(tmap)
tmap_options(limits = c(facets.view = 5))  # we want to view 5 periods
tmap_mode(mode = "view")
tm_shape(wc_spatial) + tm_facets(by = "year", ncol = 3, nrow = 2) + tm_polygons("pct_wc", 
    id = "navn", style = "pretty", title = "% of flats with <br>access to WC")


## Total baths per area

Recalculate the number of baths to total per sq kilometer

wc_spatial <- wc_spatial %>% mutate(bath_per_km = bath/area_km2, wc_per_km = wc_access/area_km2)

..or continue with communal resources and warm water

Why not practice and try plotting the flats that have access to communal baths and wc, and or hot water? Create your own map here, following the examples above.

Get additional data for Copenhagen from OpenStreetMap API

The OpenStreetMap contains free and open spatial data for physical features on the ground, with each features’ type being define using key:value pair tags. Each tag describes a geographic attribute of the feature being shown by that specific node, way or relation.

Extract OSM data

Use:

  • osmdata:opq() to define the bounding box of the osm request
  • osmdata:add_osm_feature() to define the key:value pairs you are looking for
  • osmdata:osmdata_sf() to retrieve the osm data.
library(osmdata)

# Create a bounding box
bb <- suburbs %>% st_transform(4326) %>% st_bbox()
plot(bb)

q <- opq(bbox = bb, timeout = 180)
qa <- add_osm_feature(q, key = "amenity", value = "public_bath")
# qb <- add_osm_feature(q, key = 'amenity',value = 'drinking_water')
qc <- add_osm_feature(q, key = "amenity", value = "shower")
qd <- add_osm_feature(q, key = "amenity", value = "toilets")
# qe <- add_osm_feature(q, key = 'amenity',value = 'water_point')
public_bath <- c(osmdata_sf(qa), osmdata_sf(qc), osmdata_sf(qd))

Clean up OSM data

Use the following code to clean the results and project them in Danish UTM.

This code:

  • removes the duplicated geometries thanks to osmdata::unique_osmdata (see the documentation for details)
  • projects into WGC84 UTM32
  • keeps the name attribute only
  • computes the centroids for the baths stored as polygons
  • Eventually, the baths outside our CPH suburbs are removed.
library(osmdata)
bath_uniq <- unique_osmdata(public_bath)

rpoint <- bath_uniq$osm_points %>% filter(!is.na(amenity)) %>% st_transform(32632) %>% 
    dplyr::select(name)

rpoly <- bath_uniq$osm_polygons %>% st_transform(32632) %>% dplyr::select(name) %>% 
    st_centroid()

baths_osm <- rbind(rpoly, rpoint)

baths_osm <- st_intersection(baths_osm, st_transform(suburbs, 32632) %>% st_geometry() %>% 
    st_union())

# transform also historical baths
baths_cph <- wc_spatial %>% st_centroid() %>% st_transform(32632) %>% mutate(radius = sqrt(bath_per_km)) %>% 
    arrange(desc(bath_per_km))

Display two maps side-by-side

Now, let’s display the results in two synchronized mapview maps:

  • one with bathing resources in suburbs
  • another one with baths extracted from OSM.
  • Use the mapview::sync function to display both maps side by side with synchronisation.
library(mapview)
# library(leafsync) library(leaflet)
map_osm <- mapview(baths_osm, map.types = "OpenStreetMap", col.regions = "#940000", 
    label = as.character(suburbs$name), color = "white", legend = FALSE, layer.name = "Baths in OSM", 
    homebutton = FALSE, lwd = 0.5)


# test map
mapview(baths_cph[, -3], map.types = "Stamen.TonerLite", cex = "radius", legend = FALSE, 
    col.regions = "#217844", lwd = 0, alpha = 0.4)
map_cph <- mapview(baths_cph[, -3], map.types = "OpenStreetMap", col.regions = "#940000", 
    color = "white", cex = "bath_per_km", legend = TRUE, layer.name = "Baths per sq km <br>in suburbs from 1970", 
    homebutton = FALSE, lwd = 0.5)


sync(map_osm, map_cph)

What a fantastic synced map! Two maps show entirely different datasets (OSM baths in 2021 versus historical data from suburbs) moving interactively. The synced map functionality is nice, but the comparison does not make much sense: OSM public bathroom points versus private bathing facilities originating from suburb polygons are not exactly comparable.

Question:

  1. Why is the comparison of bathroom points and suburb data meaningless and how can we improve it?

Improve the display with some comparable dataset

It might be better to combine the OSM data with the public bathhouse data that we had looked at previously in Leaflet.

We need to

  • load the data from googlespreadsheet
  • filter out missing coordinates and convert to sf object
  • project to WGS84 UTM 32
# baths <- read_sheet("https://docs.google.com/spreadsheets/d/15i17dqdsRYv6tdboZIlxTmhdcaN-JtgySMXIXwb5WfE/edit#gid=0",
#                     col_types = "ccnnncnnnc")
# write_rds(baths,"data/baths.rds")
baths <- read_rds("data/baths.rds")
names(baths)
 [1] "BathhouseName"          "Coordinates"            "Longitude"             
 [4] "Latitude"               "Quality"                "AlternativeCoordinates"
 [7] "Long"                   "Lat"                    "ErrorRadius_m"         
[10] "Notes"                 
hist_bathhouses <- baths %>% 
  dplyr::select(BathhouseName,Longitude,Latitude,Quality) %>% 
  filter(!is.na(Longitude)) %>% 
  st_as_sf(coords=c("Longitude", "Latitude"), crs = 4326)

hist_baths <- st_transform(hist_bathhouses, crs=32632)

#test map
library(mapview)
mapview(hist_baths, map.types = "Stamen.TonerLite",
        #cex="radius", legend=FALSE,
        col.regions="#217844", lwd=0, alpha=0.4)

Now, let’s load this projected historical bathouse object in the synced map so we can compare the locations with OSM data.

library(mapview)
map_osm <-  mapview(baths_osm, map.types = "OpenStreetMap", 
        col.regions = "#940000", 
        label = as.character(suburbs$name), 
        color = "white", legend = FALSE, layer.name = "Baths in OSM",
        homebutton = FALSE, lwd = 0.5) 

map_hist <-  mapview(hist_baths, 
          map.types = "OpenStreetMap", 
        col.regions = "#940000", 
        color = "white", 
       # cex = "bath_per_km",
        legend = TRUE, 
        layer.name = "Public bathhouses, early 20th century",
        homebutton = FALSE, lwd = 0.5) 

sync(map_osm,map_hist)


Lovely comparison of apples and apples (more or less). Basically we have two different point patterns, showing current public baths and toilets in Copenhagen and historical ones from early 20th century. The city has grown, hygienic standards have risen, and so clearly have the hygienic facilities. While you can see some spatial differences and may postulate the reason for increased density of points, how would you assess it formally?
In the next section, next wee, you will learn how to formally evaluate the (dis)similarity of spatial patterning between the historical and current data.

Comparing two point patterns. How do we best do it?

We have two patterns, historical and OSM data. Are they similar or dissimilar? How do the patterns of historical and current public bathhouses compare beyond a quick eyeball evaluation?

Here we might be able to use some statistical functions that contrast nearest neighbor distances or multi-distance clustering across the two groups.

We will learn how to do that next week!